High performance automatic mispronunciation detection method based on neural network and TRAP features
Abstract
In this paper, we propose a new approach that uses temporal information and a neural network (NN) to improve the performance of automatic mispronunciation detection (AMD). First, alignments between the speech signals and the corresponding phoneme sequences are obtained within the classic GMM-HMM framework. Then, long-time TempoRAl Patterns (TRAPs) [5] features are introduced to describe pronunciation quality in place of conventional spectral features (e.g., MFCC). Based on the phoneme boundaries and the TRAPs features, we use a Multi-layer Perceptron (MLP) to calculate the final posterior probability of each test phoneme, and decide whether it is mispronounced by comparing that probability with a phone-dependent threshold. Moreover, we combine the TRAPs-MLP method with our existing methods to further improve performance. Experiments show that the TRAPs-MLP method gives a significant relative reduction of 39.04% in EER (Equal Error Rate), and that the fusion of the TRAPs-MLP, GMM-UBM and GLDS-SVM [4] methods yields a relative EER reduction of 48.32%, both compared with the baseline GMM-UBM method.
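The decision step and the evaluation metric described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy scores, and the convention that a low posterior indicates a mispronunciation are all assumptions made for illustration, and the EER is only approximated by sweeping a threshold over the observed scores.

```python
import numpy as np

def detect_mispronunciations(posteriors, phones, thresholds):
    """Flag a phoneme as mispronounced when its posterior is below the
    threshold of its phone (assumed convention: low posterior = poor)."""
    return [p < thresholds[ph] for p, ph in zip(posteriors, phones)]

def equal_error_rate(scores, is_mispronounced):
    """Approximate the EER by sweeping a single threshold over the scores.

    scores: posterior-like scores, higher meaning better pronounced.
    is_mispronounced: ground-truth labels; both classes must be present.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(is_mispronounced, dtype=bool)
    best_gap, eer = np.inf, None
    # Candidate thresholds: every observed score, plus +inf (flag everything).
    for t in np.append(np.unique(scores), np.inf):
        flagged = scores < t
        false_alarm = np.mean(flagged[~labels])  # correct phones flagged
        miss = np.mean(~flagged[labels])         # mispronunciations not flagged
        gap = abs(false_alarm - miss)
        if gap < best_gap:
            best_gap, eer = gap, (false_alarm + miss) / 2
    return float(eer)

# Hypothetical toy data, not taken from the paper.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.5]
toy_labels = [False, False, False, False, True, True, True, True]
toy_eer = equal_error_rate(toy_scores, toy_labels)
toy_flags = detect_mispronunciations([0.9, 0.3], ["a", "a"], {"a": 0.5})
```

In practice each phone would have its own threshold, tuned on held-out data, whereas this sketch sweeps one global threshold only to show how the two error rates trade off against each other.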
Similar resources
A multi-scale convolutional neural network for automatic cloud and cloud shadow detection from Gaofen-1 images
The reconstruction of the information contaminated by cloud and cloud shadow is an important step in pre-processing of high-resolution satellite images. The cloud and cloud shadow automatic segmentation could be the first step in the process of reconstructing the information contaminated by cloud and cloud shadow. This stage is a remarkable challenge due to the relatively inefficient performanc...
Neural Network Performance Analysis for Real Time Hand Gesture Tracking Based on Hu Moment and Hybrid Features
This paper presents a comparative study of multilayer perceptron (MLP) and radial basis function (RBF) neural networks, trained with supervised learning and the backpropagation algorithm, for tracking hand gestures. Both networks have two output classes: hand and face. Skin is detected in the image by a region-based algorithm, and then the networks are applied on video sequences frame by frame in...
SUBCLASS FUZZY-SVM CLASSIFIER AS AN EFFICIENT METHOD TO ENHANCE THE MASS DETECTION IN MAMMOGRAMS
This paper is concerned with the development of a novel classifier for automatic mass detection of mammograms, based on contourlet feature extraction in conjunction with statistical and fuzzy classifiers. In this method, mammograms are segmented into regions of interest (ROI) in order to extract features including geometrical and contourlet coefficients. The extracted features benefit from...
An Effective Model for SMS Spam Detection Using Content-based Features and Averaged Neural Network
In recent years, there has been considerable interest in using the short message service (SMS) as one of the essential and straightforward communication services on mobile devices. The increased popularity of this service has also increased the number of attacks on mobile devices, such as SMS spam messages. SMS spam messages constitute a real problem to mobile subscribers; this worries telecomm...
Dimensionality Reduction and Improving the Performance of Automatic Modulation Classification using Genetic Programming (RESEARCH NOTE)
This paper shows how genetic programming can be used to select suitable features for automatic modulation recognition. Automatic modulation recognition is one of the essential components of modern receivers. In this regard, the selection of suitable features may significantly affect the performance of the process. Simulations were conducted at SNRs of 5 dB and 10 dB. Test and ...